Material

Neural Information Processing Systems

In what follows, we give details omitted from the paper due to space limits. The supplement is organized as follows. We give the proofs of Lemmas 1 and 2, Proposition 1, Lemmas 3 and 4, and Theorem 2 in Sections A.1-A.6, respectively. We provide training details in Section A.13 and experiment details and results in Section A.14. We compare polynomial CBFs with NCBFs in Section A.15 and compare NCBFs with different activation functions in Section A.16.

A.1 Proof of Lemma 1

We prove by induction on $L$. If $L = 1$, then $x \in X(S)$ if the pre-activation input to the $(1, j)$ neuron is nonnegative for all $j \in S_1$ and nonpositive for all $j \notin S_1$. The pre-activation input is equal to $W_{1j}^T x + r_{1j}$, establishing the result for $L = 1$.
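The base case of the induction can be checked numerically. The following is a minimal sketch, not the authors' code: the function name `in_activation_region`, the row-per-neuron layout of `W`, and the 0-based neuron indices are all assumptions made for illustration. It tests whether a point is consistent with a first-layer activation set $S_1$ by inspecting the sign of each pre-activation $W_{1j}^T x + r_{1j}$.

```python
from typing import Sequence, Set


def in_activation_region(
    W: Sequence[Sequence[float]],  # assumed layout: W[j] is the weight vector W_{1j}
    r: Sequence[float],            # r[j] is the bias r_{1j}
    x: Sequence[float],
    S1: Set[int],                  # hypothetical 0-based indices of active neurons
) -> bool:
    """Check the L = 1 condition: pre-activations are nonnegative on S1
    and nonpositive off S1."""
    for j, (w_j, r_j) in enumerate(zip(W, r)):
        pre = sum(w * xi for w, xi in zip(w_j, x)) + r_j
        if j in S1 and pre < 0:
            return False
        if j not in S1 and pre > 0:
            return False
    return True


# Two neurons in two dimensions; the pre-activations at x are 2.0 and -0.5.
W = [[1.0, 0.0], [0.0, 1.0]]
r = [0.0, -1.0]
x = [2.0, 0.5]
print(in_activation_region(W, r, x, {0}))     # True
print(in_activation_region(W, r, x, {0, 1}))  # False
```

A point whose pre-activation is exactly zero for some neuron passes the check for both choices of that neuron's membership, which matches the nonnegative/nonpositive wording of the condition.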


9a6b278218966499194491f55ccf8b75-Supplemental-Conference.pdf

Neural Information Processing Systems

The unit $\ell_2$-sphere in $d$ dimensions that is centered at the origin is denoted by $S^{d-1}$. Additionally, given a pair of symmetric matrices $A, B \in \mathbb{R}^{d \times d}$, we write $A \succeq B$ if and only if $x^\top (A - B) x \geq 0$ for all $x \in \mathbb{R}^d$. More linear algebra facts appear in Appendix E. Let $V \subseteq P$ be a subset of distributions indexed by the points in the hypercube $E_d = \{-1, 1\}^d$. For a number of facts from probability and statistics (both related and unrelated to exponential families), we refer the reader to Appendix F.
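The matrix ordering and the hypercube index set can be made concrete with a small sketch. The helper names below (`quad_form`, `loewner_geq_2x2`, `hypercube`) are hypothetical and do not come from the paper. The ordering check is deliberately restricted to symmetric $2 \times 2$ matrices, where $A - B$ is positive semidefinite exactly when its trace and determinant are both nonnegative. A general-$d$ check would instead inspect the smallest eigenvalue of $A - B$.

```python
from itertools import product


def quad_form(M, x):
    """Evaluate the quadratic form x^T M x."""
    n = len(x)
    return sum(x[i] * M[i][j] * x[j] for i in range(n) for j in range(n))


def loewner_geq_2x2(A, B):
    """Return True if A - B is positive semidefinite (symmetric 2x2 inputs only)."""
    D = [[A[i][j] - B[i][j] for j in range(2)] for i in range(2)]
    trace = D[0][0] + D[1][1]
    det = D[0][0] * D[1][1] - D[0][1] * D[1][0]
    return trace >= 0 and det >= 0


def hypercube(d):
    """List the 2^d points of {-1, 1}^d."""
    return list(product((-1, 1), repeat=d))


A = [[2, 0], [0, 1]]
B = [[1, 0], [0, 1]]
print(loewner_geq_2x2(A, B))  # True
print(loewner_geq_2x2(B, A))  # False
print(len(hypercube(3)))      # 8
```

Here $A - B$ has eigenvalues 1 and 0, so the ordering holds in one direction only.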


ClipRover: Zero-shot Vision-Language Exploration and Target Discovery by Mobile Robots

arXiv.org Artificial Intelligence

Vision-language navigation (VLN) has emerged as a promising paradigm, enabling mobile robots to perform zero-shot inference and execute tasks without specific pre-programming. However, current systems often separate map exploration and path planning, with exploration relying on inefficient algorithms due to limited (partially observed) environmental information. In this paper, we present a novel navigation pipeline named "ClipRover" for simultaneous exploration and target discovery in unknown environments, leveraging the capabilities of a vision-language model named CLIP. Our approach requires only monocular vision and operates without any prior map or knowledge about the target. For comprehensive evaluations, we design the functional prototype of a UGV (unmanned ground vehicle) system named "Rover Master", a customized platform for general-purpose VLN tasks. We integrate and deploy the ClipRover pipeline on Rover Master to evaluate its throughput, obstacle avoidance capability, and trajectory performance across various real-world scenarios. Experimental results demonstrate that ClipRover consistently outperforms traditional map traversal algorithms and achieves performance comparable to path-planning methods that depend on prior map and target knowledge. Notably, ClipRover offers real-time active navigation without requiring pre-captured candidate images or pre-built node graphs, addressing key limitations of existing VLN pipelines.